A general, prediction error-based criterion for selecting model complexity for high-dimensional survival models.
Abstract
When fitting predictive survival models to high-dimensional data, an adequate criterion for selecting model complexity is needed to avoid overfitting. The complexity parameter is typically selected by the predictive partial log-likelihood (PLL) estimated via cross-validation. As an alternative criterion, we propose a relative version of the integrated prediction error curve (IPEC), which can be stably estimated via bootstrap resampling. The IPEC has the advantage of being applicable for models and fitting techniques where the PLL is not available. To investigate the performance of this new criterion, a simulation study is carried out, mimicking microarray survival data. Additionally, model selection by predictive PLL, estimated via bootstrap resampling instead of cross-validation, is examined. It is seen that this mostly results in similar prediction performance of the selected models, compared to estimates based on cross-validation. Model selection by bootstrap estimates of the IPEC performs about as well as selection by cross-validation estimates of the PLL. Therefore, it is expected to be a reasonable alternative in cases where there is no PLL. Similar results are seen in the analysis of a microarray survival data set from patients with diffuse large-B-cell lymphoma.
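As a rough illustration of the quantities named in the abstract, the following sketch computes a prediction error curve (the time-dependent Brier score) and integrates it over a time grid. It is not the authors' implementation. It assumes uncensored event times, whereas real microarray survival data are censored and would need inverse-probability-of-censoring weights. The "relative" value shown here, the ratio to a covariate-free null model, is an assumed definition chosen for illustration, since the abstract does not state the exact form. The bootstrap resampling step is omitted. All function and variable names are hypothetical.

```python
import math
import random


def prediction_error_curve(event_times, surv_pred, grid):
    """Brier score at each grid time, assuming NO censoring (a simplification).

    event_times: list of n observed event times
    surv_pred:   n rows, each with one predicted P(T > t) per grid time
    grid:        increasing list of evaluation times
    """
    n = len(event_times)
    curve = []
    for j, t in enumerate(grid):
        squared_error = 0.0
        for i in range(n):
            still_event_free = 1.0 if event_times[i] > t else 0.0
            squared_error += (still_event_free - surv_pred[i][j]) ** 2
        curve.append(squared_error / n)
    return curve


def integrate(curve, grid):
    """Trapezoidal integral of the curve over the grid (the IPEC)."""
    return sum(
        0.5 * (curve[k] + curve[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )


# Toy data: exponential event times whose rate depends on one covariate.
random.seed(0)
n = 200
covariate = [random.gauss(0.0, 1.0) for _ in range(n)]
rates = [math.exp(0.8 * x) for x in covariate]
times = [random.expovariate(r) for r in rates]
grid = [0.05 * k for k in range(41)]  # evaluation times 0.0 .. 2.0

# "Model": the true survival function per subject (stands in for a fitted model).
model_pred = [[math.exp(-r * t) for t in grid] for r in rates]

# Null model: the same empirical survival curve for every subject.
null_row = [sum(1.0 for T in times if T > t) / n for t in grid]
null_pred = [null_row] * n

ipec_model = integrate(prediction_error_curve(times, model_pred, grid), grid)
ipec_null = integrate(prediction_error_curve(times, null_pred, grid), grid)

# Assumed relative version: values below 1 mean the model beats the null model.
relative_ipec = ipec_model / ipec_null
print(relative_ipec)
```

In a model selection setting, a quantity like `relative_ipec` would be evaluated for each candidate value of the complexity parameter on resampled data, and the value with the smallest error would be selected.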
Journal title:
- Statistics in Medicine
Volume 29, Issue 7-8
Pages -
Publication date 2010